Segmentation of prosodic phrases for improving the naturalness of synthesized Mandarin Chinese speech

Authors

  • Zhengyu Niu
  • Peiqi Chai
Abstract

In natural speech, sentences are broken into breath groups. Some words are more closely grouped with adjacent words; we call these groups prosodic phrases. To improve the naturalness of synthesized speech, prosodic processing is needed in both the text-processing component and the speech-generation component. The text-processing component is the more important of the two, because the performance of the speech-generation component depends on the output of the text-processing component. This paper discusses how to break sentences into prosodic phrases. First, the text is segmented into Chinese words. These words are then annotated by an automatic part-of-speech (POS) tagger. Adjacent words that have a close syntactic relation are grouped into prosodic phrases using the POS tags and syntactic phrase-structure information. Other factors must also be taken into consideration when breaking prosodic phrases, such as speech rate, pragmatic knowledge, the context, and the speaker's feelings. The POS tagging algorithm integrates a statistical method with a rule-based method. The statistical method uses a 2-gram Markov language model: the most likely POS sequence for a given sentence is found by searching through the language model and picking the most likely path. The rule-based method is then used to correct the errors made by the statistical method, which identifies a word's category using context information. In experiments, the tagger correctly tagged 94% of the words in an independent test set of 1.2 thousand Chinese characters. Prosodic phrases are then formed by rules that use lexical information and phrase-structure information. In experiments, we obtained a break-correct rate of 86% and a recall rate of 90%. After segmentation into prosodic phrases, the grouped words are read continuously when the text is converted to speech, which improves the naturalness of the synthesized speech.
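The abstract does not give the details of the search over the 2-gram model, so the following is only a minimal sketch of one standard way to find the most likely tag path, the Viterbi algorithm over a bigram hidden Markov model. The function name `viterbi_bigram`, the two-tag set, the example sentence, and every probability below are illustrative assumptions, not values from the paper, and the rule-based correction step is not shown.

```python
import math


def viterbi_bigram(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most likely tag sequence under a bigram (first-order) HMM.

    All tables are plain dicts of probabilities; missing entries are
    replaced by `floor` (an assumed smoothing constant).
    """
    if not words:
        return []

    def logp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-prob of the best path ending in tag t at word i,
    #               tag chosen at word i - 1 on that path)
    best = [{
        t: (logp(start_p.get(t, 0)) + logp(emit_p.get(t, {}).get(words[0], 0)), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = logp(emit_p.get(t, {}).get(words[i], 0))
            prev, score = max(
                ((p, best[i - 1][p][0] + logp(trans_p.get(p, {}).get(t, 0)))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit, prev)
        best.append(column)

    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy model (all numbers are made up for illustration).
TAGS = ["N", "V"]  # hypothetical two-tag set: nominal, verb
START_P = {"N": 0.7, "V": 0.3}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT_P = {
    "N": {"我": 0.5, "北京": 0.4, "爱": 0.1},
    "V": {"爱": 0.9, "我": 0.05, "北京": 0.05},
}

if __name__ == "__main__":
    print(viterbi_bigram(["我", "爱", "北京"], TAGS, START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numeric underflow on long sentences. In a real tagger the probability tables would be estimated from a tagged corpus rather than written by hand.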


Similar resources

Prosody generation in Chinese synthesis using the template of quantified prosodic unit and base intonation contour

This paper presents a prosody generation method for Chinese mandarin using the template of quantified prosodic unit and base intonation contour. This method uses the prosodic feature picked-up from the syllables in the prosody words by rule as the base unit, and integrates the prosody rules in the prosody words of Chinese mandarin and base intonation contour to achieve the prosody contours with...


Prosodic Alternative Units in a Mandarin Chinese Speech Synthesizer

The Mandarin Chinese synthesis component of the Dresden Speech Synthesizer DreSS is based on an inventory of syllabic units. The inventory contains all Chinese syllables with the possible tones in up to three phonetic variations for a correct modeling of the cross syllable coarticulation effects. In order to improve the naturalness and fluency of the synthesized speech, the inventory was comple...


Multi-strategy data mining on Mandarin prosodic patterns

Mandarin prosodic models are very important in speech research and synthesis, which mainly describes the variation of pitch. The models that are now being used in most Chinese Text-To-Speech systems are constructed by expert, qualitatively and with low precision. In this paper, we propose a Multi-strategy Data Mining framework to extract prosodic patterns from actual large Mandarin speech datab...


Prosodic Boundary Prediction Based on Maximum Entropy Model with Error-Driven Modification

Prosodic boundary prediction is the key to improving the intelligibility and naturalness of synthetic speech for a TTS system. This paper investigated the problem of automatic segmentation of prosodic word and prosodic phrase, which are two fundamental layers in the hierarchical prosodic structure of Mandarin Chinese. Maximum Entropy (ME) Model was used at the front end for both prosodic word a...




Publication date: 2000